Skip to content

feat: Lazy identity-flag evaluation in local-eval mode#200

Merged
emyller merged 6 commits intomainfrom
feat/lazy-identity-flags
Apr 28, 2026
Merged

feat: Lazy identity-flag evaluation in local-eval mode#200
emyller merged 6 commits intomainfrom
feat/lazy-identity-flags

Conversation

@khvn26
Copy link
Copy Markdown
Member

@khvn26 khvn26 commented Apr 24, 2026

Adds lazy identity-flag evaluation to local-eval mode, addressing the regression reported in #198.

Flagsmith.get_identity_flags returns a Flags that holds the evaluation context plus a precomputed segment-overrides reverse index, and resolves each feature on first access by handing a trimmed context (the queried feature plus only the segments that could override it) to engine.get_evaluation_result. The engine remains the only public API the SDK consumes — identity-key enrichment, multivariate hashing, percentage-split rules and override-priority handling all stay in the engine.

The reverse index is rebuilt inside the _evaluation_context setter, so it stays in sync with environment refreshes with zero hot-path cost.

Bench against real customer env (Sevenrooms QA, 438 features, 23 segments)

Hot loop of get_identity_flags(...).is_feature_enabled(name), 20,000 iterations, single process, single identity.

Version mean p50 p99 p99.9 max
flagsmith 3.8.0 (flag-engine 5.4.3) 434 µs 408 µs 726 µs 2.59 ms 25.7 ms
flagsmith 5.2.0 (flag-engine 10.0.4) 442 µs 403 µs 572 µs 4.52 ms 42.4 ms
this PR 2.71 µs 2.54 µs 3.83 µs 27.0 µs 183 µs

100–200× across every percentile vs any released version, on the customer's actual environment shape.

Back-compat

  • Flags public surface unchanged (is_feature_enabled, get_feature_value, get_flag, all_flags).
  • FlagResult produced via engine.get_evaluation_result — identical output shape to the previous bulk-eval path.
  • Flagsmith.__init__ signature unchanged.

Engine contract

Untouched. SDK only imports flag_engine.engine.get_evaluation_result and the SegmentContext type alias.

Tests

  • 8 new unit tests in test_models.py (override match/no-match, per-flag caching, all_flags materialisation, fallthrough to default handler, reverse-index correctness).
  • 1 new integration test in test_flagsmith.py: get_identity_flags defers eval; touching one flag triggers exactly one engine call against a trimmed context (queried feature only).
  • 94 passing, mypy strict clean, black/isort clean.

Follow-ups (not in this PR)

A separate optimisation skips the env-doc re-parse on no-change refreshes via the x-flagsmith-document-updated-at response header — it eliminates the ~5 ms p99 GIL stall the polling thread otherwise imposes every refresh interval. That'll come as its own PR to keep this one focused.

@khvn26 khvn26 requested a review from a team as a code owner April 24, 2026 20:10
@khvn26 khvn26 requested review from gagantrivedi and removed request for a team April 24, 2026 20:10
Copy link
Copy Markdown
Member

@gagantrivedi gagantrivedi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if I'm overthinking it, but this pull request certainly makes me feel conflicted. On one hand, it does make optimisations that make sense; on the other, it blurs the boundary between engine and client that we worked very hard to establish.

Comment thread flagsmith/models.py

return flag

def _resolve_flag(self, feature_name: str) -> Flag:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lazy path skips identity-key enrichment (multivariate + percentage_split break)

flagsmith/models.py:265–311 — Flags._resolve_flag

  • L274–275: grabs context = self._context and overrides_index = self._overrides_index straight from the lazy Flags state.
  • L280: feature_context = context["features"][feature_name] — reads the feature off the raw per-identity context that was stored in from_evaluation_context.
  • L287–295: walks the segment overrides index, calls is_context_in_segment(context, segment_context) (L288). This is where PERCENTAGE_SPLIT rules silently fail —
    context_matches_condition in the engine falls back to _get_identity_key(context) (engine evaluator.py:301), and that key is never set on this context.
  • L297–308: passes the same un-enriched context to get_flag_result_from_context. Inside the engine (evaluator.py:179, key = _get_identity_key(context)), key is
    None, so the if key is not None and (variants := ...) branch (engine evaluator.py:183) is skipped — multivariate always returns the control value.

flagsmith/flagsmith.py:431–456 — _get_identity_flags_from_document

  • L439–443: builds the per-identity context via map_context_and_identity_data_to_context(...).
  • L444: branches on self.lazy_identity_evaluation.
  • L448–456 (lazy branch): hands context straight to Flags.from_evaluation_context — no get_enriched_context call.
  • L457–459 (eager branch): goes through engine.get_evaluation_result(context=context), which does call get_enriched_context at engine.py evaluator.py:60.

flagsmith/mappers.py:88–99 — map_context_and_identity_data_to_context

  • L93–98: returns {**context, "identity": {"identifier": identifier, "traits": ...}}. Note no key. The eager path's enrichment fills this in; the lazy path
    doesn't.

Maybe we should consider bringing back the testing harness we had that used to cover all SDKs but was limited to just the engine during the last context value work — to avoid issues like this.

Copy link
Copy Markdown
Member Author

@khvn26 khvn26 Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks — all this fixed by using get_evaluation_result against a trimmed context in 5a7d7dc.

Maybe we should consider bringing back the testing harness we had that used to cover all SDKs but was limited to just the engine during the last context value work — to avoid issues like this.

I'll have a think on this... there's a bit of a chicken & egg problem when it comes to remote evaluation.

Comment thread flagsmith/models.py Outdated
for override in segment_context.get("overrides") or ():
if override["name"] != feature_name:
continue
priority = override.get("priority", float("inf"))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider import DEFAULT_PRIORITY from engine instead here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now are — this is once again the engine's concern as per 5a7d7dc.

@khvn26 khvn26 marked this pull request as draft April 28, 2026 09:57
khvn26 added 3 commits April 28, 2026 11:21
``get_identity_flags`` now returns a ``Flags`` that holds the
evaluation context plus a precomputed segment-overrides reverse index,
and resolves each feature on first access via the engine primitives
(``is_context_in_segment`` + ``get_flag_result_from_context``) rather
than running a full bulk evaluation up-front.

In environments shaped like the Slack-report customer (420 features,
30 CSV-IN segments, hot loop reading one boolean flag) this takes
``get_identity_flags().is_feature_enabled(name)`` from ~430 µs to
~1.85 µs per call; 200-segment envs go from ~1200 µs to ~2 µs. The
``.all_flags()`` materialisation path is never slower than the
eager baseline in the bench matrix.

Back-compat:
  * ``Flags`` public API unchanged (``is_feature_enabled``,
    ``get_feature_value``, ``get_flag``, ``all_flags``).
  * ``FlagResult`` construction reuses the same engine helper as the
    bulk path — identical output shape.
  * New ``lazy_identity_evaluation`` constructor kwarg, default
    ``True``, lets operators flip back to the eager path if they hit
    an unexpected regression.

Engine contract is untouched: the SDK consumes only already-public
``flag_engine.segments.evaluator`` symbols.

beep boop
Picks up the IN segment-condition evaluation speedup
(Flagsmith/flagsmith-engine#295), which cuts per-IN-condition latency
on segment walks by roughly 30%. Complementary to the lazy identity
evaluation added in this PR — most customer envs will benefit from both.

beep boop
…on_result

Per review: keep the engine/client boundary intact and let the engine
handle all evaluation correctness — instead of reaching into
``is_context_in_segment`` / ``get_flag_result_from_context`` directly,
``Flags._resolve_flag`` now builds a *trimmed* context (the queried
feature plus only the segments that could override it, looked up via
the precomputed reverse index) and hands it to the engine's public
``get_evaluation_result``.

Side effects:
  * Identity-key enrichment now runs on the lazy path (the engine's
    ``get_enriched_context`` is invoked internally), so multivariate
    splits and ``PERCENTAGE_SPLIT`` rules behave correctly. Previously
    the lazy path silently degraded these.
  * Override-priority handling moves back into the engine — the
    ``float("inf")`` literal is gone from the SDK.
  * ``Flags.all_flags`` switches to a single bulk
    ``get_evaluation_result`` rather than calling ``_resolve_flag`` per
    feature; cheaper, and matches the eager path's call shape.

Trim cost is ~0.8 µs per call, so the lazy path is now ~2.6 µs mean /
~3.4 µs p99 against the customer's prod env (438 features, 23
segments) — still 150–220× faster than the eager path on every
percentile, and 100% routed through the engine's documented public API.

beep boop
@khvn26 khvn26 force-pushed the feat/lazy-identity-flags branch from 05f6304 to 5a7d7dc Compare April 28, 2026 10:24
khvn26 added 3 commits April 28, 2026 11:29
Lazy is now the only path through ``_get_identity_flags_from_document``.
Per-flag resolution still goes via the engine's public
``get_evaluation_result`` (against a trimmed context), so callers get
the perf win without the eager fallback or the kwarg surface.

Tests pinned to the old eager path are reworked to mock
``flagsmith.models.engine.get_evaluation_result`` instead — that's
where the call lands now — and trigger evaluation via ``.all_flags()``
when they need the bulk-context call shape.

beep boop
Replaces the module-level helper with three fixtures:

  * ``lazy_context_factory`` — keyword-only callable producing an
    evaluation context, for tests that need a non-default shape.
  * ``lazy_context`` — a default context (identity matches the
    segment override).
  * ``lazy_flags`` — a ``Flags`` built from the default context.

Tests now request whichever fixture suits their case instead of
calling the helper directly. While here, mark each test body with
Given / When / Then comments to match the rest of the file.

beep boop
@khvn26 khvn26 marked this pull request as ready for review April 28, 2026 10:40
@khvn26 khvn26 requested a review from gagantrivedi April 28, 2026 10:43
Copy link
Copy Markdown
Contributor

@emyller emyller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. Left a few non-blocking comments.

Comment thread flagsmith/flagsmith.py
PipelineAnalyticsProcessor
] = None
self._evaluation_context: typing.Optional[SDKEvaluationContext] = None
self.__evaluation_context: typing.Optional[SDKEvaluationContext] = None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: possible, but unharmful, misuse os Python name mangling.

Comment thread flagsmith/models.py
Comment on lines +186 to +190
In lazy mode, the caller has signalled they want every flag, so
we run the bulk evaluator once on the full context and copy the
results into the per-flag cache. Cheaper than asking the engine
for each feature one at a time.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: This comment, and a few similar others, read a bit like deixis. This SDK exports no such thing as "lazy mode" — seemingly, it's the only mode now.

Comment thread flagsmith/models.py
Comment on lines +238 to +242
if (
self._context is not None
and self._overrides_index is not None
and feature_name in (self._context.get("features") or {})
):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (
self._context is not None
and self._overrides_index is not None
and feature_name in (self._context.get("features") or {})
):
if (
and self._overrides_index is not None
and feature_name in (self._context.get("features") or {})
):

@emyller emyller dismissed gagantrivedi’s stale review April 28, 2026 17:57

Because we need to release this sooner rather than later, I am dismissing @gagantrivedi's previous request for changes, but will leave threads open for any follow-ups.

@emyller emyller merged commit f7750bb into main Apr 28, 2026
6 checks passed
@emyller emyller deleted the feat/lazy-identity-flags branch April 28, 2026 17:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants